Machine translation using bilingual term entries extracted from parallel texts

نویسنده

  • Tatsuya Izuha
چکیده

Patent summaries are machine-translated using bilingual term entries extracted from parallel texts for evaluation. The result shows that bilingual term entries extracted from 2,000 pairs of parallel texts which share a specific domain with the input texts introduce more improvements than a technical term dictionary with 38,000 entries which covers a broader domain. The result also shows that only 10 pairs of parallel texts found by similar document retrieval have comparable effects to the technical term dictionary, suggesting that parallel texts to be used do not need to be classified into fields prior to term extraction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon

In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the EnglishFrench language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents), and we compared the results of the Example-Based ...

متن کامل

Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases

In this paper, we present a new hybridisation approach consisting of enriching the phrase table of a phrase-based statistical machine translation system with bilingual phrase pairs matching structural transfer rules and dictionary entries from a shallowtransfer rule-based machine translation system. We have tested this approach on different small parallel corpora scenarios, where pure statistic...

متن کامل

Automatic transfer rule induction from parallel corpora

Recently, many projects have been proposed aiming at automatically transforming the multilingual information available on parallel texts into linguistic knowledge useful for machine translation. This paper describes an ongoing PhD project in which the main goal is to automatically induce transfer rules and bilingual dictionaries from part-of-speech tagged and lexically aligned parallel corpora....

متن کامل

Automatic induction of translation lexicons from aligned parallel corpus

Translation lexicons are one of the most important linguistic resources for machine translation. However, this bilingual set of word and multiword correspondences requires a lot of manual work to be built. This paper describes a method to automatically build translation lexicons by extracting knowledge from PoS-tagged and lexically aligned parallel corpora. Preliminary experiments were carried ...

متن کامل

Identifying Word Correspondences in Parallel Texts

Researchers in both machine translation (e.g., Brown et a/, 1990) arm bilingual lexicography (e.g., Klavans and Tzoukermarm, 1990) have recently become interested in studying parallel texts (also known as bilingual corpora), bodies of text such as the Canadian Hansards (parliamentary debates) which are available in multiple languages (such as French and English). Much of the current excitement ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Systems and Computers in Japan

دوره 36  شماره 

صفحات  -

تاریخ انتشار 2005